Data pruning using confidence measures for concatenative synthesis system built using automatically transcribed audio
Abstract
Today, we can record and store large amounts of single-speaker audio data, and also download such data from the web. These data are generally prosodically rich and are therefore excellent candidates for building concatenative text-to-speech (TTS) systems. However, transcriptions for these audio data are often unavailable, and automatic transcriptions are error-prone. In addition, the audio contains acoustically bad regions (poorly articulated, noisy, inaudible, unintelligible, or clipped). Both problems can degrade the resulting synthesized voice, so pruning bad data becomes necessary. In this paper, we describe the development of two concatenative TTS systems, using lecture speech downloaded from Coursera and an audiobook downloaded from Librivox. Confidence measures such as phone posterior probability and unit duration, obtained from the ASR system, are used to remove bad data. Voices built using automatic transcripts are compared with those built using reference transcripts, and the effect of data pruning on intelligibility and naturalness is investigated through perceptual evaluation on the Blizzard 2013 test corpus.
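The pruning idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Unit` record, the `prune_units` function, and all threshold values are hypothetical, chosen only to show how a phone posterior probability and a unit duration might be combined to discard suspect units before building a voice.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Unit:
    """One phone-sized unit from an ASR alignment (hypothetical layout)."""
    phone: str
    start: float      # start time in seconds
    end: float        # end time in seconds
    posterior: float  # phone posterior probability in [0, 1]

    @property
    def duration(self) -> float:
        return self.end - self.start


def prune_units(
    units: List[Unit],
    min_posterior: float = 0.5,  # assumed threshold, not from the paper
    min_dur: float = 0.03,       # assumed lower duration bound (seconds)
    max_dur: float = 0.30,       # assumed upper duration bound (seconds)
) -> List[Unit]:
    """Keep units whose posterior and duration both look plausible."""
    kept = []
    for u in units:
        if u.posterior < min_posterior:
            continue  # low confidence: likely a transcription or alignment error
        if not (min_dur <= u.duration <= max_dur):
            continue  # implausibly short or long: likely a bad acoustic region
        kept.append(u)
    return kept


example = [
    Unit("a", 0.00, 0.10, 0.90),  # kept
    Unit("b", 0.10, 0.20, 0.20),  # dropped: low posterior
    Unit("c", 0.20, 0.21, 0.95),  # dropped: too short
    Unit("d", 0.21, 1.00, 0.95),  # dropped: too long
]
kept_phones = [u.phone for u in prune_units(example)]
```

In practice the thresholds would need tuning per phone class and per corpus, since, for example, typical vowel and stop durations differ considerably.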
Similar resources
Speech recognition based confidence measures for building voices from untranscribed speech
Today, a large amount of audio data is available on the web in the form of audiobooks, podcasts, video lectures, video blogs, and news bulletins. In addition, we can effortlessly record and store audio data such as read/lecture/impromptu speech on hand-held devices. These data are rich in prosody, provide a plethora of voices to choose from, and their availability can significantly reduce the overhea...
Prioritizing Audio Features Selection Using Analysis Hierarchy Process As A Mean To Extend User Control In Concatenative Sound Synthesis
User control is one of the most important heuristic principles of system design, as it gives users the freedom to choose a system’s functions and serves as a means of communicating instructions to the system before performing a specific task. Existing concatenative sound synthesis systems call for a more flexible user control function, in particular during feature selection. This paper studie...
An auditory-based distortion measure with application to concatenative speech synthesis
This study presents a new auditory-based distance measure with application to concatenative speech synthesis. This measure employs the Carney auditory model to produce a feature vector related to auditory perception. For concatenative synthesis, the new measure is employed to assess perceived discontinuities at segment transitions. Evaluations using a restricted database environment show that ...
EarGram: an Application for Interactive Exploration of Large Databases of Audio Snippets for Creative Purposes
This paper outlines the creative and technical considerations behind earGram, an application built as a Pure Data patch for real-time concatenative sound synthesis. The system encompasses four generative strategies that automatically re-arrange and explore a database of descriptor-analyzed sound snippets (corpus) by rules other than its original temporal order into musically coherent outputs. O...
Spatializing Timbre With Corpus-Based Concatenative Synthesis
Corpus-based concatenative synthesis presents unique possibilities for the visualization of audio descriptor data. These visualization tools can be applied to sound diffusion in the physical space of the concert hall using current spatialization technologies. Using CATART and the FTM&CO library for MAX/MSP we develop a technique for the organization of a navigation space for synthesis based on ...
Publication date: 2015